Identification of mine water sources using a multi-dimensional ion-causative nonlinear algorithmic model

Based on nonlinear algorithm theory, an R-SVM water source discrimination model and prediction method were established. The Piper diagram was used to compare qualitatively the differences between the ionic components, and R-type factor analysis was used to reduce the dimension of the input indicators. Taking the mine water samples of Zhaogezhuang Coal Mine as an example, six indexes (Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻) were selected as discrimination factors according to the chemical composition analysis of water samples from different monitoring points. According to the water characteristics of each aquifer and the actual needs of discrimination, the water inrush sources in the mining area were divided into four categories: goaf water (class I), Ordovician carbonate (class II), sandstone fracture water from the 13 coal system (class III) and sandstone fracture water from the 12 coal system (class IV). With 56 typical water inrush samples as training samples and 11 groups as prediction samples, the typical ion contents were taken as the input and the water source type as the output. The R-SVM water source discriminant analysis model was implemented with SPSS Statistics and MATLAB; it automatically establishes the mapping between the water quality indexes and the evaluation standards, allowing rapid and accurate discrimination of water sample data. In the verification on the Zhaogezhuang mine example, the classification accuracy of the R-SVM model was 90.90%. The coupled model has high accuracy, good applicability and discriminant ability, and provides guidance for the prevention and control of water damage and for related field work.


Principle of R-factor dimensionality reduction
Suppose there are m test variables $Z_i$ ($i = 1, 2, \dots, m$), which may be correlated. Each $Z_i$ contains the independent common factors $f_j$ ($j = 1, 2, \dots, p$; $p \le m$) and one of m mutually uncorrelated unique factors $u_1, u_2, \dots, u_m$; u and f are mutually uncorrelated. Each $Z_i$ can be expressed linearly in terms of f and u as 18:
$$Z_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{ip} f_p + c_i u_i, \quad i = 1, 2, \dots, m \tag{1}$$
Expressed in matrix form:
$$\begin{pmatrix} Z_1 \\ \vdots \\ Z_m \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mp} \end{pmatrix} \begin{pmatrix} f_1 \\ \vdots \\ f_p \end{pmatrix} + \begin{pmatrix} c_1 u_1 \\ \vdots \\ c_m u_m \end{pmatrix} \tag{2}$$
Abbreviated as:
$$Z = AF + CU \tag{3}$$
where $A = (a_{ij})$ is the factor loading matrix and $C = \mathrm{diag}(c_1, \dots, c_m)$. The factor analysis method replaces Z by F through Eqs. (2) and (3), on the condition that $p < m$, which reduces the number of dimensions and removes redundancy. The specific steps are 19:
(1) Construct the sample matrix and perform the correlation test. Collect n observations of the p-dimensional random variable $X = (x_1, x_2, \dots, x_p)^T$ and construct the sample matrix:
$$X = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}$$
The KMO or Bartlett test is used to test the correlation between the variables. If the correlation coefficients are less than 0.3, dimensionality reduction is not meaningful; if the correlation is strong, the commonality of the variables can be extracted and the data are suitable for factor analysis.
(2) Standardize the data. The standardization is done as follows:
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of the j-th indicator. This gives the standardized matrix $Z = (z_{ij})$.
(3) Calculate the correlation matrix. The correlation coefficient matrix $R = (r_{jk})$ is computed from the standardized matrix Z:
$$r_{jk} = \frac{1}{n - 1} \sum_{i=1}^{n} z_{ij} z_{ik}$$
The eigenvalues $\lambda$ are obtained from the characteristic equation $|R - \lambda I_p| = 0$ of the correlation matrix, and the common factors are then extracted using the above approach so that the information utilization rate exceeds 85%.
(4) Calculate the factor loading matrix and rotate it to obtain the matrix U, where $u_i$ is the principal component vector of the i-th sample and $u_{ij}$ is the projection of that vector on the unit eigenvector.
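Steps (2) and (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names and the small data set are hypothetical (they are not the Zhaogezhuang samples), and the sample standard deviation with divisor n − 1 is an assumption, since the paper relies on SPSS for the actual computation.

```python
import math

def standardize(rows):
    """Column-wise z-scores using the sample standard deviation (divisor n - 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
            for col, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def correlation_matrix(z_rows):
    """Correlation matrix R = Z^T Z / (n - 1) of an already standardized matrix Z."""
    n = len(z_rows)
    cols = list(zip(*z_rows))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

# Hypothetical data: 4 samples x 3 indicators (not taken from Table 1).
samples = [[1.0, 2.0, 9.0], [2.0, 4.1, 7.0], [3.0, 5.9, 4.0], [4.0, 8.0, 2.0]]
R = correlation_matrix(standardize(samples))
```

The eigenvalues of R would then be obtained with a numerical library or a statistics package; that part is deliberately left out of the sketch.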

Support vector machine principle
The support vector machine (SVM) simplifies complex problems by establishing nonlinear mapping relationships and is good at dealing with nonlinear complex systems. By performing inner product operations in the transformed space, it automatically establishes the mapping between water quality indicators and evaluation criteria, so that the category of each predicted sample can be classified effectively. The principle is shown in Fig. 1.
The support vector machine consists of three parts: an input layer, an intermediate inner product kernel function layer, and an output layer. The water source discriminants $X_1, X_2, X_3, \dots, X_n$, which represent the sample feature information, are input into the SVM model, and the intermediate kernel function layer maps them into a high-dimensional space to seek the optimal solution. The specific mapping relationship in this transformation does not need to be considered, and the discriminated type of the water source is output by the output layer after a nonlinear transformation 20.
The procedure of SVM classification is as follows 21,22: ① Determine the input sample variables $\{x_i\} \subset X = \mathbb{R}^n$ and the output variables $y_i \in Y = \{1, -1\}$. ② Select the kernel function and the optimal combination of parameters, and determine the multipliers $\alpha_i$ according to the constraints.
After raising the dimension with a nonlinear mapping $\phi: \mathbb{R}^d \to H$, the optimization problem can be transformed into:
$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i \left( w \cdot \phi(x_i) + b \right) \ge 1$$
Introducing Lagrange multipliers $\alpha_i \ge 0$ yields:
$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ y_i \left( w \cdot \phi(x_i) + b \right) - 1 \right]$$
The dual objective function is:
$$\max_{\alpha} \ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \ \alpha_i \ge 0$$
where $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is a kernel function that implicitly maps the data and then learns from it. The classification decision function is obtained as:
$$f(x) = \mathrm{sgn} \left( \sum_i \alpha_i y_i K(x_i, x) + b \right)$$
The soft margin, with the introduction of the penalty factor C and the slack variables $\xi_i$ ($\xi_i \ge 0$), is optimized as:
$$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i, \ \xi_i \ge 0$$
The optimal decision function can be obtained as:
$$f(x) = \mathrm{sgn} \left( \sum_i \alpha_i^{*} y_i K(x_i, x) + b^{*} \right), \quad 0 \le \alpha_i^{*} \le C$$
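To make the role of the kernel and of the decision function concrete, the following minimal sketch evaluates a two-class decision function for given multipliers. It assumes a radial basis function kernel with parameter g, which is a common choice when a parameter g is tuned together with c but is not stated explicitly in the text; the support vectors, multipliers and bias are hypothetical toy values, not a trained model.

```python
import math

def rbf_kernel(x, z, g):
    """Assumed kernel: K(x, z) = exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, support_vectors, alphas, labels, b, g):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b for labels y_i in {1, -1}."""
    s = sum(a * y * rbf_kernel(sv, x, g)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Hypothetical toy model with one support vector per class.
svs = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
near_first = decision((0.1, 0.1), svs, alphas, labels, b=0.0, g=0.5)
near_second = decision((1.9, 1.9), svs, alphas, labels, b=0.0, g=0.5)
```

A four-class problem such as the one in this paper would need several such binary decision functions combined, for example in a one-against-one scheme; which scheme was used is not specified in the text.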

Optimal parameter solving
In this paper, the grid search method is used to search for the optimal parameters. A fixed-step grid search 23 is adopted: a brute-force search that combines coarse and fine steps. With a large step size in the search space, all candidate points are enumerated in turn, and the value ranges of c and g are set to $[2^{-10}, 2^{10}]$. The process and principle of the search are shown in Fig. 2. The steps of support vector machine optimization by the grid search method are as follows 24,25:
(1) Create a coordinate grid. Set up the training learner, choose the step size L, enter the parameter search range, and set the grid nodes as $c = 2^X$, $g = 2^Y$.
(2) Use K-fold cross-validation to obtain the classification accuracy. The samples are divided into N subsets; one subset serves as the test set and the remaining N − 1 subsets serve as the training set, which is used for model building. The accuracy evaluation method is set to obtain the classification accuracy corresponding to each set of parameters.
(3) Traverse the coordinate grid. Among all the traversed parameters, the combination of (c, g) with the highest classification accuracy (that is, the smallest error) is selected to obtain the optimal trainer, and the accuracy of the optimal trainer is output.
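The three steps above can be sketched as follows. The sketch is independent of any particular SVM library: the caller supplies a scoring function, and the `toy_accuracy` function used here is purely hypothetical (it simply peaks at c = 1 and g = 2^1.5), as is the exponent step of 0.5 chosen for the demonstration.

```python
import math

def exponent_grid(low, high, step):
    """Exponents low, low + step, ..., high used for c = 2**X and g = 2**Y."""
    count = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(count)]

def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]

def grid_search(n_samples, n_folds, exponents, cv_accuracy):
    """Return (best mean accuracy, c, g) over the grid of (2**X, 2**Y) nodes."""
    best = None
    for x in exponents:
        for y in exponents:
            c, g = 2.0 ** x, 2.0 ** y
            scores = [cv_accuracy(c, g, train, test)
                      for train, test in k_fold_indices(n_samples, n_folds)]
            mean = sum(scores) / len(scores)
            if best is None or mean > best[0]:
                best = (mean, c, g)
    return best

def toy_accuracy(c, g, train, test):
    """Hypothetical stand-in for 'train an SVM on train, score it on test'."""
    return 1.0 / (1.0 + abs(math.log2(c)) + abs(math.log2(g) - 1.5))

best_acc, best_c, best_g = grid_search(56, 5, exponent_grid(-10, 10, 0.5), toy_accuracy)
```

In practice `cv_accuracy` would train the classifier on the training indices with the given (c, g) and return its accuracy on the held-out fold.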

Hydrogeologic conditions in the study area
The coal seams in the Zhaogezhuang Coal Mine are predominantly distributed within the Upper Taiyuan Formation (Zhaoge Formation) of the Shanxi Formation (Da Miaozhuang Formation). The presence of faults on the eastern, western, southern, and northern boundaries has resulted in the uplift and exposure of the Ordovician limestone due to tectonic activity. This faulting has led to the development of intense structural karst. Consequently, the gently inclined limestone has formed troughs, and a robust karst development zone has emerged along the eastern boundary fault of the Kaiping block. The overlying Quaternary loose layers exhibit coarse particle size, exceptional permeability, and high water content, serving as a prominent conduit for groundwater movement and constituting the primary strong runoff zone in the regional groundwater system. The hydrodynamic forces are notably strong, displaying characteristics of concentrated conduit flow. Furthermore, a portion of the groundwater in the eastern part of the Shahe River basin in the Zhaogezhuang mine infiltrates the field's interior through the Leizhuang fault, with groundwater flowing from the northeast to the southwest.
The Zhaogezhuang Coal Mine has developed five major aquifer systems from the Cambrian to the Quaternary: the Cambrian aquifer, the Ordovician limestone aquifer, the coal-bearing formation sandstone aquifer, the Tangshan limestone aquifer, and the Quaternary alluvial aquifer. The Quaternary alluvial aquifer in the study area exhibits a relatively thin structure, exerting minimal impact on coal mining operations. In contrast, the Cambrian aquifer predominantly interacts with the Ordovician aquifer. Consequently, the Ordovician aquifer assumes a pivotal role in water influx incidents within the study area, particularly in cases of deep water influx. The principal contributors to these occurrences are the aquifers comprising Ordovician limestone and coal-bearing sandstone within the coal-bearing rock series. To maximize differentiation of water source types, the study selected the six most widely distributed ions in groundwater as discriminative indexes 26,27. These include Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻ and HCO₃⁻. K⁺ was combined with Na⁺ due to their low variation range.

Data index extraction and collection
For data selection, the Zhaogezhuang mine's deep mining process was primarily threatened by Ordovician carbonate from the Ordovician aquifer, followed by goaf water damage and sandstone water damage. As a result, four water sample types were chosen: goaf water (from the I aquifer), Ordovician carbonate (from the II aquifer), sandstone fracture water from the 13 coal system (from the III aquifer), and sandstone fracture water from the 12 coal seam (from the IV aquifer section A). To screen the typical water sample data, 67 groups were selected from 19 boreholes based on the anion and cation balance test and the hydrogeological data of Zhaogezhuang. Among these groups, 18 were from goaf water, 13 from Ordovician carbonate, 17 from 13 coal seam sandstone fracture water, and 19 from 12 coal seam sandstone fracture water. The four water sample sources are indicated by I, II, III, and IV respectively. The water samples were submitted to the Testing and Analysis Center of Hebei Coalfield Geology Bureau for chemical analysis. The water quality testing report provided analysis of the main ions and the total hardness (TH) using ion chromatography. Additionally, the bicarbonate ion (HCO₃⁻) and total alkalinity (TA) were determined through titration using dilute sulfuric acid and methyl orange. The pH value was measured using a pH tester. Subsequently, the data on the nine discriminant indices of the mine water were organized and presented in Table 1 (attached).
Of the 67 sets of typical water sample data collected from the Zhaogezhuang mining area, 56 were used as training samples for the learning machine, as shown in Table 2 (attached), while the remaining 11 sets were reserved as test samples, labeled G1 to G11, as presented in Table 3. The distribution of anion and cation content is illustrated using three-dimensional diagrams, with the cation content distribution depicted in Fig. 3 and the anion content distribution shown in Fig. 4.

Analysis of statistical characteristic values
The statistical characteristic values of the water chemistry were calculated and analyzed from the water chemistry data of the 67 groups of water samples from Zhaogezhuang mine. In the water sample data of the study area, the goaf water is clearly different from the other three types of water samples in ionic composition. Among the anions of the goaf water, the one with the highest content is SO₄²⁻, at 78.022 mmol·L⁻¹, whereas in the other water samples it is HCO₃⁻. The goaf water is therefore easier to identify than the other three types of water sources: a sample whose dominant anion is SO₄²⁻ can be initially classified as goaf water. Among the cations, the highest content in all four types of water samples is Ca²⁺. In addition, in terms of the overall content of anions and cations in all water sample data, the contents of Ca²⁺ and HCO₃⁻ are higher than those of the other ions, which indicates that Ca²⁺ and HCO₃⁻ have strong recognition ability.
The goaf water. The hydrochemical indexes of the goaf water are shown in Table 4. The water chemical composition of the four types of water samples from Zhaogezhuang differed significantly, and their ion concentrations were related to the water source cycle. In the goaf water, SO₄²⁻ had the highest concentration among the anions, ranging from 60.47 mmol·L⁻¹ to 85.55 mmol·L⁻¹ and accounting for 78% of the anions, followed by HCO₃⁻; Cl⁻ had the lowest concentration. The cations were mainly Ca²⁺ and Mg²⁺, and Na⁺ had the lowest concentration. The coefficient of variation is the ratio of the standard deviation to the mean and indicates the degree of dispersion of the data. The coefficient of variation of Cl⁻ was the largest at 0.9, followed by that of Na⁺ at 0.41, and the rest were smaller, indicating poor uniformity of the Cl⁻ and Na⁺ concentrations in the water.
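The coefficient of variation defined above is straightforward to compute. The sketch below assumes the sample standard deviation (divisor n − 1), which the text does not specify, and uses hypothetical concentrations rather than values from Table 4.

```python
import statistics

def coefficient_of_variation(values):
    """Ratio of the (sample) standard deviation to the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical concentrations in mmol/L, for illustration only.
cv_example = coefficient_of_variation([2.0, 4.0, 6.0])
```

A value close to 0 means the ion concentration is nearly uniform across the samples; values approaching 1, such as the 0.9 reported for Cl⁻, indicate strong dispersion.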
Ordovician carbonate. The hydrochemical indexes of the Ordovician carbonate are shown in Table 5. The pH of the Ordovician carbonate is 7.30 to 7.94, which is weakly alkaline. HCO₃⁻ and SO₄²⁻ together account for 86.6% of the anions, and the cation concentrations follow the order Ca²⁺ > Mg²⁺ > Na⁺, with Ca²⁺ and Mg²⁺ accounting for 92.88%; the water chemistry type is Ca-Mg-HCO₃. The coefficients of variation of the Ordovician carbonate are in the order SO₄²⁻ > Cl⁻ > Na⁺ > Mg²⁺ > HCO₃⁻ > Ca²⁺. The coefficients of variation of all six indexes are less than 0.5, and those of the anions Cl⁻, SO₄²⁻ and HCO₃⁻ are on the whole greater than those of the cations Na⁺, Mg²⁺ and Ca²⁺.
Sandstone fracture water from the 13 coal system. The hydrochemical indexes of the sandstone fracture water from the 13 coal system are shown in Table 6. Among the anions in the fracture water of the 13 coal sandstone, HCO₃⁻ has the highest concentration, up to 79.58 mmol·L⁻¹, while the contents of SO₄²⁻ and Cl⁻ are lower; among the cations, Ca²⁺ has the highest concentration, followed by Mg²⁺. Apart from Na⁺, the coefficients of variation differ little and are less than 0.1, so the ion concentrations are distributed fairly uniformly.
Sandstone fracture water from the 12 coal system. The hydrochemical indexes of the sandstone fracture water from the 12 coal system are shown in Table 7. The anions in the fracture water of the 12 coal seam sandstone are mainly HCO₃⁻, with a mean concentration of 71.79 mmol·L⁻¹. The cations are dominated by Ca²⁺, up to 64.36 mmol·L⁻¹, followed by Mg²⁺ with a mean concentration of 32.57, and finally Na⁺. The coefficient of variation of Mg²⁺ is as high as 0.69.
To study the hydraulic connection between individual aquifers, the degree of connection K between them can be calculated quantitatively 28,29. Since the Cl⁻ concentration is minimally disturbed by other factors and is mainly influenced by the formation itself, the degree of hydraulic connection between two aquifers can be obtained from the difference between their average Cl⁻ concentrations (Eq. (16)), where Cl₁ is the average Cl⁻ concentration in aquifer 1 and Cl₂ is the average Cl⁻ concentration in aquifer 2. If the K value between two aquifers is less than 0.2, they have a strong hydraulic connection; if K is greater than 0.4, the hydraulic connection is weak; if K is between 0.2 and 0.4, the hydraulic connection is moderate 30,31.
Through Eq. (16), the K values between the Ordovician carbonate and each of the goaf water, the sandstone fracture water of the 13 coal system and the sandstone fracture water of the 12 coal system are all 0.25, so the degree of hydraulic connection is moderate. The K value between the goaf water and the sandstone fracture water of the 13 coal system is 0.025, and that between the goaf water and the 12 coal seam sandstone fracture water is 0.03; by the above criterion both indicate a strong hydraulic connection. The K value between the 13 coal system sandstone fracture water and the 12 coal seam sandstone fracture water is 0.001, which indicates a very strong hydraulic connection. In summary, there is a certain hydraulic connection between the goaf water and the other aquifers, which increases the difficulty of discrimination.
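The threshold rule for K can be written as a small helper. This sketch only encodes the classification bands stated in the text; it does not compute K itself, and treating the boundary values 0.2 and 0.4 as "moderate" is an assumption, since the text does not say which band they belong to.

```python
def connection_strength(k):
    """Classify the hydraulic connection from K using the stated thresholds."""
    if k < 0.2:
        return "strong"
    if k <= 0.4:
        return "moderate"
    return "weak"

# K = 0.25 is one of the values reported in the text; the others are illustrative.
examples = {k: connection_strength(k) for k in (0.1, 0.25, 0.5)}
```

Under this rule, smaller differences in average Cl⁻ concentration (smaller K) correspond to stronger hydraulic connection.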

Piper trilinear diagram analysis
The hydrogeological conditions in Zhaogezhuang Coal Mine are characterized by complexity and variability. As demonstrated by the previous analysis, the composition of the goaf water is distinguishable from that of the other water sources. To further investigate the distribution patterns of the aquifer water samples, the Piper trilinear diagram method was employed. The ion contents were represented as points on the diagram, allowing the water chemistry type and quality pattern of each aquifer to be inferred from the scatter position of the water samples.
The water samples of the study area were plotted on a Piper trilinear diagram for hydrochemical analysis, as shown in Fig. 5. The goaf water is located in the upper right corner, near Ca²⁺, Mg²⁺ and SO₄²⁻, Cl⁻; it is mainly of the Ca·Mg-Cl·SO₄ type, with individual samples of the Ca·Mg-SO₄ type. The Ordovician carbonate samples are located on the left of the diamond-shaped area, and the water quality type is Ca·Mg-HCO₃. The left triangle shows that the cations in the Ordovician carbonate samples are mainly Mg²⁺ and Ca²⁺, and the right triangle shows that the anions are mainly HCO₃⁻ and SO₄²⁻. Sandstone fracture water from the 13 coal system is located in the middle-left position; the cations are mainly Ca²⁺ and Mg²⁺, the anions are scattered towards the end members with a high proportion of HCO₃⁻ and SO₄²⁻, and the water quality type is Ca·Mg-HCO₃. Sandstone fracture water samples from the 12 coal system plot very close to those from the 13 coal system in the trilinear diagram; the water chemistry type is Ca·Mg-HCO₃, the cations are mainly Ca²⁺ and Mg²⁺, and the anions are mainly HCO₃⁻ and CO₃²⁻. In summary, the water quality types of the Ordovician carbonate and of the sandstone fracture water from the 13 and 12 coal seams are the same, with overlapping characteristics and indistinct distribution boundaries, so further quantitative discrimination is needed.
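A Piper diagram places each sample according to the milliequivalent percentages of its major cations and anions. The sketch below shows that conversion under stated assumptions: concentrations are given in mmol/L, Na⁺ stands for the combined Na⁺ + K⁺ as in this paper, and the sample values are hypothetical rather than taken from Table 1.

```python
VALENCE = {"Na": 1, "Ca": 2, "Mg": 2, "Cl": 1, "SO4": 2, "HCO3": 1}
CATIONS = ("Na", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3")

def meq_percentages(mmol_per_l):
    """Percent of total cation (or anion) charge carried by each ion."""
    meq = {ion: mmol_per_l[ion] * VALENCE[ion] for ion in VALENCE}
    result = {}
    for group in (CATIONS, ANIONS):
        total = sum(meq[ion] for ion in group)
        for ion in group:
            result[ion] = 100.0 * meq[ion] / total
    return result

# Hypothetical sample in mmol/L, for illustration only.
sample = {"Na": 1.0, "Ca": 2.0, "Mg": 1.0, "Cl": 1.0, "SO4": 1.0, "HCO3": 3.0}
shares = meq_percentages(sample)
```

The three cation percentages give the position in the left triangle, the three anion percentages the position in the right triangle, and the two points are projected into the central diamond.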

Dimensionality reduction based on R-factor
Before the operation, the data are normalized to the interval [0, 1] to make the indicators comparable and to ensure the stability of the calculation. The normalized water sample data are shown in Table 8 (attached).
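One common way to map each indicator onto [0, 1] is min-max scaling, sketched below. Whether the authors used exactly this formula is not stated, so it should be read as an assumption; the data are hypothetical, and mapping a constant column to 0.0 is an arbitrary choice made for this sketch.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1] via (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]

# Hypothetical data: 3 samples x 2 indicators.
normalized = min_max_normalize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```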
There is a nonlinear association between the indicators. To reduce the correlation between the data, the optimal number of common factors for the six indicators (sodium, calcium, magnesium, chloride, sulfate and bicarbonate ions) was determined to be 3, denoted Y1, Y2 and Y3. SPSS software was used to analyze the 67 groups of samples and 6 evaluation indicators of Zhaogezhuang following the calculation steps of the R-type factor analysis. The eigenvalues and contribution rates of the main factors are shown in Table 9.
The cumulative contribution rate of the first three principal factors reaches 96.660%, which indicates that the factors extracted by dimensionality reduction contain 96.660% of the information of the original index data.When the cumulative contribution rate reaches 80%, it shows that the extracted principal factors are reasonable and effective, which indicates that these three principal factors cover most of the water chemistry information and can effectively replace the original indexes.
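The cumulative contribution rate is the running share of the total eigenvalue sum. The following sketch shows how the number of retained factors follows from a chosen threshold; the eigenvalues are hypothetical placeholders (they sum to 6, as they would for six standardized indicators) and are not the values of Table 9.

```python
def cumulative_contribution(eigenvalues):
    """Running share of the eigenvalue sum, largest eigenvalue first."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        shares.append(running / total)
    return shares

def n_factors(eigenvalues, threshold=0.80):
    """Smallest number of factors whose cumulative contribution reaches threshold."""
    for k, share in enumerate(cumulative_contribution(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, for illustration only.
eigs = [3.0, 2.0, 0.8, 0.1, 0.06, 0.04]
kept_at_80 = n_factors(eigs, 0.80)
kept_at_95 = n_factors(eigs, 0.95)
```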
In the factor correlation matrix, a correlation coefficient above 0.8 in absolute value indicates a strong correlation, a value between 0.3 and 0.8 indicates a moderate correlation, and a value below 0.3 indicates no correlation. The correlation coefficient between Na⁺ and Ca²⁺ is −0.416, indicating a moderate correlation, while the coefficients between Na⁺ and Mg²⁺ (−0.167), Cl⁻ (0.231), SO₄²⁻ (−0.104) and HCO₃⁻ (0.080) all indicate no correlation. The correlation coefficient between Ca²⁺ and Mg²⁺ is −0.799, a moderate correlation, as is the case between Ca²⁺ and the other ions. Similarly, Mg²⁺ is not correlated with Na⁺ and is moderately correlated with the other ions, while Cl⁻ and SO₄²⁻ are strongly correlated, as are SO₄²⁻ and HCO₃⁻.
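The thresholds quoted above can be applied mechanically to the coefficients reported in the text. In this sketch the thresholds are applied to the absolute value of the coefficient, which is an assumption (the text quotes them without sign), and the function name is hypothetical.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the 0.3 and 0.8 thresholds on |r|."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    return "none"

# Coefficients quoted in the text.
reported = {"Na-Ca": -0.416, "Na-Mg": -0.167, "Na-Cl": 0.231,
            "Na-SO4": -0.104, "Na-HCO3": 0.080, "Ca-Mg": -0.799}
labels = {pair: correlation_strength(r) for pair, r in reported.items()}
```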
Using the maximum variance (varimax) orthogonal rotation method, SPSS was used to obtain the factor loading matrix and the rotated component matrix. Three new main components Y₁, Y₂ and Y₃ were extracted, and the factor score coefficient matrix was computed in SPSS. According to the factor score coefficient matrix, the main factors Y₁, Y₂ and Y₃ are expressed as linear combinations of the six indicators. The original data of water samples (I), (II), (III) and (IV) from Zhaogezhuang mine were substituted into the model expressions of the three main factors Y₁, Y₂ and Y₃ to obtain the factor score matrices.

R-SVM model establishment
The R-SVM model is shown in Fig. 6. First, the R-factor is used to reduce the dimensionality of the data. The three common factors Y₁, Y₂ and Y₃ are used as the input variables of the model and the four types of water sources H as its output, establishing the mapping F(Y₁, Y₂, Y₃) → H, which automatically searches for complex connections between the input variables and the types of water sources. The grid search method is used to find the optimal combination of parameters for the support vector machine model. The training set data are then used to train the model, and the trained model is used to predict the water sample types for the testing set data. The predicted types are then compared with the actual types to correct for any deviations. This process is repeated until the model achieves a satisfactory level of accuracy in predicting the types of water samples.

Parameter search and model application
Six indicators (sodium, calcium, magnesium, chloride, sulfate and bicarbonate ions) are used as input variables of the SVM, and the four water source types (goaf water, Ordovician carbonate, sandstone fracture water from the 13 coal system and sandstone fracture water from the 12 coal system) are used as outputs of the model, so that the SVM establishes the mapping between the two and learns their nonlinear relationship. First, the 56 sets of training samples and 11 sets of prediction samples are passed to the grid search method to search for the parameters; the ranges of the parameters c and g are set to $c \in [2^{-10}, 2^{10}]$ and $g \in [2^{-10}, 2^{10}]$, with step size L = 0.2, following the operation process of SVM. In the R-SVM model, the three common factors of Zhaogezhuang obtained by dimensionality reduction were used as the input variables, and the same four water source types of Zhaogezhuang mine were used as the outputs, to establish the mapping between the common factors and the water source types. The factor scores of the 67 sets of sample data after dimensionality reduction were substituted into the grid-search SVM model for training, and the best parameter combination, c = 1 and g = 2.8284, was obtained. The result of the optimization search is shown in Fig. 7. Substituting c = 1 and g = 2.8284 into the SVM model, the types of the 11 sets of data to be discriminated were predicted, and the final results are shown in Fig. 8 and Table 10. The model misjudged a Type II Ordovician carbonate sample as Type III sandstone fracture water from the 13 coal system; with a classification accuracy of 90.90% on the prediction samples, the model is suitable for water source discrimination in Zhaogezhuang Coal Mine and can distinguish the sources effectively.
Table 11 presents a comparative analysis of model performance across different optimization types.The accuracy and precision metrics were employed to evaluate the models' efficacy.The Fisher optimization type exhibits the lowest performance in terms of accuracy and precision.The Grid optimization type shows a significant improvement in both accuracy and precision compared to the Fisher type.Notably, the R-type grid optimization type demonstrates the highest level of performance, surpassing both the Fisher and Grid types in terms of accuracy and precision.
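Accuracy and per-class precision can be computed directly from the actual and predicted labels. The sketch below uses a hypothetical set of 11 labels with a single Type II sample predicted as Type III; the split of the 11 samples across classes is invented for illustration, and the exact precision definition behind Table 11 is not given in the text.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def precision(actual, predicted, label):
    """Of the samples predicted as `label`, the fraction that truly are `label`."""
    claimed = [a for a, p in zip(actual, predicted) if p == label]
    if not claimed:
        return 0.0
    return sum(1 for a in claimed if a == label) / len(claimed)

# Hypothetical labels for 11 prediction samples (class split is illustrative).
actual = ["I", "I", "I", "II", "II", "II", "III", "III", "III", "IV", "IV"]
predicted = list(actual)
predicted[3] = "III"  # one Type II sample misjudged as Type III

acc = accuracy(actual, predicted)
prec_iii = precision(actual, predicted, "III")
```

With one error in 11 samples the accuracy is 10/11, about 90.9%, and only the precision of the class that received the misjudged sample drops below 1.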
The coupled R-SVM discriminant model characterized the water sources in a more targeted and effective way than the other models whose prediction results are presented in Table 11. Using the R-factor simplification as a new discriminant input improves the independence of the model's input components. The coupled R-SVM discriminant model can also complement the qualitative analysis of water chemistry and provide rapid identification of water sources.

Conclusion
For Zhaogezhuang Coal Mine, a coal mine of submarine mining, the identification and prediction of mine water inrush sources is of great significance to the safety and efficiency of mine production. To prevent and control water inrush, it is of great practical significance to identify the mine water source effectively and accurately. Through the analysis of water source data from different parts of the mine, an effective water source discrimination model was established and its effectiveness and practicability were verified. The conclusions of the study are as follows:

Table 1. 67 groups of water chemistry data.
Table 2. Training sample data.
Table 3. Forecast sample data.
Table 6. Hydrochemical index of sandstone fracture water from the 13 coal system.
Table 7. Hydrochemical index of sandstone fracture water from the 12 coal system.
Table 8. Normalization of water sample data.
Table 9. Characteristic values and contribution rates of main factors.
Table 10. Comparison of model operation results.
Table 11. Comparison of model performance.